PREFACE

This report was prepared with support from the National Institute
of Education and the Lilly Endowment, Inc. The purpose of the research
was to examine methodologies for modeling students' choices among
higher education institutions.

A statistical technique called "conditional logit analysis" has
recently been popularized; its applications include exactly the problem
studied here. The authors review these applications and point out
certain weaknesses inherent in the approach. They then offer an
alternative approach, based on the use of Bayes' Theorem, which is
easier to use, more flexible, and less expensive to apply. In empirical
tests, it was also observed to have greater predictive power than
conditional logit analysis.

The authors are grateful to Rand colleagues Bryan C. Ellickson,
Gus Haggstrom, and John J. McCall for valuable comments on an
earlier draft of this report.
SUMMARY

This study revisits a problem that has received considerable
attention in recent years: modeling students' choices among
institutions of higher education. We offer a methodological approach to
the problem which obviates some of the technical and methodological
difficulties encountered in previous studies, where the primary tool of
analysis has been "conditional logit." We demonstrate our approach
with data from the SCOPE 1966 survey of high school seniors and
compare our results to those obtained in other analyses of the SCOPE data.

We regard the SCOPE data as drawn from a population described by
a joint density P(i,j), where i identifies a particular student and
j a particular institution. The problem is to obtain a parametric
model for P(j|i), the probability that student i chooses institution j.
The conditional logit approach uses a maximum likelihood technique to
estimate P(j|i) directly, whereas we suggest a two-stage procedure in
which the parameters of P(i|j) are estimated via ordinary linear
regression, then Bayes' Theorem is used to obtain P(j|i). The regression
models describe student ability, income, and distance from home as
functions of the characteristics of chosen institutions. In using Bayes'
Theorem, we assume that the prior probability of choosing a given
institution depends on its size.

We apply our model to the problem of predicting the distribution
of students among certain homogeneous categories of institutions. We
find that the deviations between predicted and actual distributions
are quite small and that the predictive power of our model is
substantially greater than that of alternative models which used the
conditional logit methodology to analyze the same data set.

Conditional logit studies of individual choice behavior in a
variety of areas have recently appeared in the literature. Our results
suggest that the Bayesian formulation is a viable alternative. Questions
of predictive power aside, the Bayesian methodology is easier to use,
offers much greater flexibility, and is less expensive to apply. Thus,
even in cases where theoretical considerations might suggest the
alternative approach, the Bayesian methodology would be a useful adjunct
in the exploratory stages of research.
I. INTRODUCTION

This study revisits a problem that has received considerable
attention in recent years: modeling students' choices among
institutions of higher education. Our primary objective is to offer a
methodological approach to the problem which obviates some of the
technical and methodological difficulties encountered in previous
studies. We demonstrate the approach with data from the SCOPE 1966
survey of high school seniors,¹ and compare our results to those
obtained in other analyses of the SCOPE data.

Our point of departure is the recent work of Kohn, Manski, and
Mundel [1] and Radner and Miller [2,3].² Both used a statistical
estimation technique called "conditional logit" to analyze students'
choices, given their characteristics.³ The conditional logit approach
overcomes many of the limitations of the other available approaches.⁴
But it has important limitations of its own.

First, the technique has very demanding data requirements. The analyst
must know the entire set of alternatives each student considered in
making his choice. Second, the computational problems involved in
maximizing the logit likelihood function are so severe as to limit both
the flexibility one has in choosing the functional form of the
relationships and the amount of exploratory analysis one can do. It is
barely feasible to write down a single model specified by theory and
then to estimate parameters. It is not feasible to admit that the
theory is weak, and thus that alternative formulations of independent
variables, goodness of fit tests, analyses of residuals, etc., should
be tried.

¹ School to College: Opportunities for Postsecondary Education.
This survey, conducted by The Center for Research and Development,
University of California, Berkeley, is described in Sec. II.

² Radner and Miller [2] present the analysis. Many of the technical
details, however, are reserved to a separately published technical
supplement, Miller and Radner [3]. For simplicity in discussion, we will
consistently refer to their joint work as Radner and Miller, using
bracketed reference numbers to distinguish between the two.

³ The conditional logit approach has been recently popularized by
McFadden [4,5]. It is now being applied in a broad range of studies of
individual decisions including choices among transportation modes [6]
and occupations [7,8].

⁴ Radner and Miller [2] provide a detailed critique of the approaches
used in earlier studies and outline the advantages of the conditional
logit technique.

We view these difficulties as motivation for our own approach,
which begins with two basic observations. First, if one is to predict
a student's choice, given his characteristics, it seems reasonable that
one should be able to say something about his characteristics, given
his choice. Second, there exists a readily applicable method to
translate statements about characteristics, given choice, into
statements about choice, given characteristics: Bayes' Theorem.

Thus, we regard the SCOPE data as drawn from a population described
by a joint density P(i,j), where i identifies a particular student and
j a particular institution. The problem is to obtain a parametric model
for P(j|i), the probability that student i chooses institution j. The
conditional logit approach uses a maximum likelihood technique to
estimate P(j|i) directly, whereas we suggest a two-stage procedure in
which the parameters of P(i|j) are estimated via ordinary linear
regression, then Bayes' Theorem is used to obtain P(j|i). The regression
models describe student ability, income, and distance from home as
functions of the characteristics of chosen institutions. In using Bayes'
Theorem, we assume that the prior probability of choosing a given
institution depends upon its size.

Section II reviews the conditional logit approach, describes the
data available from the SCOPE 1966 survey, and reviews the Kohn, Manski,
and Mundel and the Radner and Miller studies, focusing on the problems
they encountered in using the conditional logit approach. Our approach
is described in Sec. III. In Sec. IV, we describe our empirical
results in deriving the parameters of P(i|j). Section V provides an
investigation of the predictive power of our approach as compared to
that of Radner and Miller. Some concluding remarks are presented in
Sec. VI.
II. THE CONDITIONAL LOGIT APPROACH

In this section, after briefly reviewing the formal structure of
the conditional logit approach, we summarize the Radner and Miller and
the Kohn, Manski, and Mundel studies, describing their data bases,
indicating the variables they used, and giving their procedures for
imputing students' "choice sets." The section concludes with a
discussion of some of the problems they encountered.

THE FORMAL STRUCTURE⁵

The conditional logit approach is predicated on the assumptions
that the alternative an individual chooses is preferred to all other
alternatives available to him and that his preferences can be expressed
in the form of a function defined over the attributes of alternatives.
Formally, let $C_i$ be the set of mutually exclusive alternatives
available to the $i$th student; let $X_i$ be his characteristics; let
$Z_{ij}$ be the $j$th alternative's attributes with respect to him; and
let $U_i(Z_{ij})$ be a scalar-valued measure of his preference for the
$j$th alternative. He is assumed to choose the $j$th alternative if and
only if $U_i(Z_{ij}) \ge U_i(Z_{ik})$ for all $k$ in $C_i$. If
differences among individuals' preferences for a given set of attributes
have a random component, the $i$th individual's preference for the $j$th
alternative can be written $U(X_i, Z_{ij}, \varepsilon_{ij})$.

For reasons of tractability, it is necessary to assume that $U$ is
linear in parameters with an additive disturbance:

$$ U(X_i, Z_{ij}, \varepsilon_{ij}) = V(X_i, Z_{ij}) \cdot \theta + \varepsilon_{ij} , \tag{1} $$

where $V$ is a vector-valued function, $\theta$ is the vector of
parameters to be estimated, and $\varepsilon_{ij}$ is a scalar random
variable. The choice of alternative $j$ implies:

$$ V(X_i, Z_{ij}) \cdot \theta + \varepsilon_{ij} \ge V(X_i, Z_{ik}) \cdot \theta + \varepsilon_{ik} , \quad \text{for all } k \in C_i , $$

or equivalently,

$$ \bigl(V(X_i, Z_{ij}) - V(X_i, Z_{ik})\bigr) \cdot \theta \ge \varepsilon_{ik} - \varepsilon_{ij} , \quad \text{for all } k \in C_i . \tag{2} $$

In order to estimate the parameters of (2), it is necessary to
specify the joint probability distribution of the $\varepsilon_{ij}$.
A probability distribution that leads to a tractable likelihood
function is the Weibull distribution:

$$ \mathrm{Prob}(\varepsilon \le t) = e^{-\alpha e^{-\beta t}} , \quad \alpha > 0 , \ \beta > 0 . $$

If $\varepsilon_{ij}$ and $\varepsilon_{ik}$ are independent and
identically distributed with this distribution, it can be shown that

$$ \mathrm{Prob}(j \text{ chosen from } C_i)
   = \mathrm{Prob}\bigl(\varepsilon_{ik} - \varepsilon_{ij} \le (V(X_i, Z_{ij}) - V(X_i, Z_{ik})) \cdot \theta , \ \text{for all } k \in C_i\bigr)
   = \frac{1}{\sum_{k \in C_i} \exp\bigl(-\beta\,(V(X_i, Z_{ij}) - V(X_i, Z_{ik})) \cdot \theta\bigr)} . \tag{3} $$

The likelihood of the observed choices made by a set of $n$ individuals
is

$$ L(\beta, \theta) = \prod_{i=1}^{n} \mathrm{Prob}(j_i \text{ chosen from } C_i) , \tag{4} $$

where $j_i$ is the $i$th individual's choice.

Function optimization procedures can be used to determine the
maximum likelihood estimates of the product $\beta\theta$. Knowledge of
$\theta$ up to this multiple is sufficient for all applications.

⁵ This subsection summarizes the discussion provided by Kohn, Manski,
and Mundel [1] of the conditional logit analysis technique.
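
To make Eqs. (3) and (4) concrete, the following is a minimal sketch, not the estimation code used by KMM or RM. It assumes the scaled systematic utilities (beta times V times theta) have already been computed for every alternative in each student's choice set; the function names and the numbers are hypothetical.

```python
import math

def choice_probabilities(utilities):
    """Logit choice probabilities for one individual's choice set, Eq. (3).

    `utilities` holds the scaled systematic utility of each alternative k
    in C_i (an assumed, precomputed input).
    """
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(choice_sets, chosen):
    """Log of Eq. (4): sum over individuals of log Prob(j_i chosen from C_i)."""
    return sum(math.log(choice_probabilities(u)[j])
               for u, j in zip(choice_sets, chosen))

# Hypothetical utilities for two students whose choice sets differ in size.
choice_sets = [[0.2, 1.0, -0.5], [0.0, 0.7]]
chosen = [1, 0]  # index of the alternative each student actually chose
probs = choice_probabilities(choice_sets[0])
ll = log_likelihood(choice_sets, chosen)
```

A maximum likelihood routine would call `log_likelihood` repeatedly while varying the parameters, which is why the cost grows with both the number of students and the size of their choice sets.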
PREVIOUS STUDIES

Data

Radner and Miller (RM) and Kohn, Manski, and Mundel (KMM) use the
SCOPE 1966 survey of high school seniors.⁶ The survey includes
approximately 34,000 students in 305 public and private high schools in
four states: California, Illinois, Massachusetts, and North Carolina. The
baseline data obtained include personal and family characteristics,
postsecondary aspirations and expectations, plans for postsecondary
education, and sources of funds for college expenses. The Academic
Ability Test (AAT), similar to the Scholastic Aptitude Test (SAT), was
given to most of the students. Both KMM and RM convert AAT scores to
the equivalent SAT scores.

In spring 1967, the SCOPE researchers attempted to "locate" the
students who had gone on to college. The institutions each student
had listed as his first or second college choice (in the baseline
survey) and the junior college nearest his home were queried. Students
were sent postcards requesting information on their current activities,
and their high school counselors were asked if they knew where the
students had gone. In all, a collegiate enrollment of 17,199 students
was established. It was assumed that the 16,741 students not "located"
at a college had not gone on to college.⁷

Responses to follow-up surveys were obtained from 10,581
college-going students, 8,683 parents of college-going students, and
3,014 parents of students who had not gone on to college.⁸ The follow-up
data included students' postsecondary activities and, if they had gone
on to college, their expenses and sources of funds. Parents were asked
to provide their 1966 family income.



⁶ High school freshmen were also surveyed in 1966 and followed for
four years, but neither RM nor KMM used that part of the data base.

⁷ While many "nongoers" were positively identified (by their response
to the follow-up postcard), it is likely that some college-going students
are included among them. The data set does not distinguish between known
nongoers and students never located.

⁸ The numbers of students and parents to whom follow-up efforts were
directed have not been published; response rates to the follow-up
surveys are unknown.



Nonresponses and "don't know" responses to the family income
question on the student baseline instrument were frequent. Moreover,
RM [3] examine the cases where a student (on the 1966 baseline
questionnaire) and his family (on the 1967 parent questionnaire) provided
independent (1966) family income estimates and found substantial
discrepancies between the two. Assuming that parent-reported income is
more accurate, both RM and KMM developed income prediction equations
by regressing parent-reported family income on students' responses on
parental education, job status, occupation, and income.

KMM obtained most of their data on institutional attributes from
the 1966 Institutional Domain File compiled by the American Council on
Education [9]. This file provides information on the tuition and fees,
faculty, programs, student characteristics, financial aid, etc., of
colleges and universities. To obtain a measure of the distance between
a student's home and a college, KMM coded the latitude and longitude
of SCOPE high schools and of colleges and universities and computed the
straight-line distance, in miles, between each high school/college pair.
RM compiled data on institutions' attributes from research reports,
institution catalogues, or direct correspondence. Instead of using a
distance measure, RM inspected road maps and classified an institution
as being within commuting distance of a student if it appeared possible
to drive from the student's high school to the institution within 50
minutes.
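
One way to turn coded latitudes and longitudes into a distance in miles is the great-circle (haversine) formula. The report does not say which formula KMM applied, so the sketch below is an assumption offered for illustration; the Earth radius constant and the function name are likewise not from the source.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical high school and college coordinates, one degree of latitude apart.
d = great_circle_miles(40.0, -88.0, 41.0, -88.0)
```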


Models

RM's choice model focused on two variables: the ratio of cost to
family income and the product of the student's ability (his SAT score)
and the college's quality (the average SAT score of freshmen attending
the institution).⁹ They assumed that the "cost" of not going on to
college was zero and that the "quality" of the "no-go" option was the
average SAT score of the California SCOPE students who had not gone on
to college.

⁹ RM defined the cost of attending an institution within commuting
distance to be tuition plus $100 (books and supplies) plus $180
(transportation costs). If the institution was beyond commuting distance,
they defined cost as tuition plus $100 (books and supplies) plus $180
(miscellaneous costs of living away from home) plus the approximate
price of a round trip air fare plus $900 (room and board).
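
The cost rule in footnote 9 and the two RM model variables can be written out as a short sketch. The dollar amounts come from the footnote; the function names, argument names, and example figures are hypothetical.

```python
def rm_cost(tuition, within_commuting_distance, round_trip_air_fare=0.0):
    """Cost of attendance under the RM definition (footnote 9)."""
    books_and_supplies = 100.0
    if within_commuting_distance:
        transportation = 180.0
        return tuition + books_and_supplies + transportation
    living_away_misc = 180.0
    room_and_board = 900.0
    return (tuition + books_and_supplies + living_away_misc
            + round_trip_air_fare + room_and_board)

def rm_variables(cost, family_income, student_sat, college_mean_sat):
    """The two RM variables: cost/income ratio and ability-quality product."""
    return cost / family_income, student_sat * college_mean_sat

# Hypothetical student and institution.
commuter_cost = rm_cost(500.0, True)
resident_cost = rm_cost(500.0, False, round_trip_air_fare=120.0)
ratio, interaction = rm_variables(resident_cost, 9000.0, 520.0, 500.0)
```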

KMM modeled students' decisions as a two-stage process. In the
first stage, each student evaluates the collegiate alternatives
available to him and identifies the most preferred. This evaluation is
assumed to depend on some 15 variables: tuition, tuition squared,
distance,¹⁰ room and board fees, the average SAT score of the students
attending the college, the squared difference between the student's SAT
score and the average SAT score of the students attending the college,
the college's revenues per student, the number of different areas in
which the college has degree-granting programs, the percentage of
students residing on campus, an indicator of single-sex institutions,
and a series of dummy variables indicating college type: private
four-year college, private two-year college, public university, public
four-year college, and public two-year college. In the second stage, the
student decides whether the most preferred college alternative is
sufficiently attractive to induce him to enroll. This evaluation depends
on father's education, mother's education, sex, and the highest
preference "score" imputed to any college in the student's choice set.¹¹



Imputing the Choice Set

In principle, each student had the option of enrolling at any
college or university that would accept him. And, in 1966, there were
over 2,300 institutions of higher education in the country, many of
which were not selective. Even the academically weak SCOPE students
could have gained admission to literally hundreds of institutions.
Computational constraints, however, preclude analysis with choice sets
of this magnitude. Thus, both RM and KMM had to devise procedures for
imputing a choice set of manageable size for each student.

¹⁰ KMM developed a separate "commuter choice" model to predict
whether or not a student would commute to a college. If the prediction
was to commute, distance was set equal to the number of miles between
home and college; for these students, the room and board variable was
set equal to zero. If the prediction was to reside, distance was set
equal to zero and the college's dormitory fee was used for room and
board.

¹¹ The college a student attended is included in his choice set;
but if the preference score imputed to some other college exceeds the
imputed preference score of his chosen college, the higher score is
used as the measure.

RM argue that the alternatives confronting any student can be
clustered into ten basic groups. The first corresponds to the "no-go"
option; the remaining nine correspond to institutions falling into
various cost-by-quality categories. Table 1 summarizes the kinds of
institutions they assign to each category.

Table 1

RADNER-MILLER CHOICE SETS: COST AND QUALITY CATEGORIES OF INSTITUTIONS

| Quality category (a) | Low cost category (less than $600) | Medium cost category ($600-$2250) | High cost category ($2250+) |
|---|---|---|---|
| Low (less than 480) | Public 2-yr colleges within commuting distance | Trade schools and private 2-yr colleges within commuting distance | Private colleges and universities |
| Medium (480-540) | Public 4-yr colleges within commuting distance | Public 4-yr colleges beyond commuting distance and low-tuition private colleges within commuting distance | Private colleges and universities |
| High (540+) | Public universities within commuting distance | Public universities not within commuting distance | Private colleges and universities |

SOURCE: Radner and Miller [2], p. 43.

(a) Measure of quality = average SAT score of all students attending
the institution.



For each student, RM identified all institutions that would have
admitted him, had he applied.¹² They then calculated the average cost
and quality of the "available" institutions in each category. If a
student went on to college, the cost and quality attributes of the
institution he attended were substituted for the average attributes
of the institutions in its category. Each student's choice set thus
comprised the "no-go" option, the institution he attended (if he went
on to college), and eight (nine if he did not go on to college)
"representative" institutions whose attributes were the mean values of
the attributes of the institutions available to him in the corresponding
cost/quality category.¹³

KMM constructed each college-going student's choice set by randomly
selecting institutions located within 200 miles of the student's high
school and applying an admissions model to determine whether or not each
was available to the student.¹⁴ Single-sex colleges serving the opposite
sex and colleges located more than 60 miles from the student which
lacked residency facilities were rejected. The process was continued
until ten "available" institutions were identified or until the set of
institutions within 200 miles was exhausted. The institution actually
attended was added to the ten, or fewer, colleges so identified to form
the student's choice set.

¹² RM consulted high school counselors, college catalogues,
admissions officers, and state officials to obtain estimates of the
minimum SAT score required for entrance to the public institutions in
each state and estimated an "admissions model" for each of 400 private
institutions. They assumed that an institution would admit a student
whose SAT score exceeded the score estimated to yield a 50 percent
admission probability.

¹³ RM do not mention weights; they presumably used unweighted mean
cost and quality measures to represent the institutions in a
cost/quality group.

¹⁴ Unlike RM, who constructed separate models for each institution,
KMM estimated a single, albeit more detailed, admissions model for all
institutions. In constructing students' choice sets, KMM estimated the
probability that the student would be admitted to a (randomly selected)
college. Rejecting schools for which admission probability was less
than .25, they generated a random number on the unit interval and
included the institution in the choice set if the random number was less
than the estimated admissions probability.

LIMITATIONS OF THE APPROACH

Choice of Choice Set

The conditional logit approach requires that each student's choice
set be completely specified. This forced both RM and KMM to develop
a number of peripheral data imputation models relating to choice sets
and admission criteria at the individual student level. These
procedures proved to be very costly. Both RM and KMM had intended to
examine the entire SCOPE sample, but had to cut back substantially on
the number of students. RM eventually concentrated their analysis on
two subsamples, each including about 375 of the roughly 34,000 SCOPE
students. And KMM could examine only the students in Illinois and
North Carolina.

The data so laboriously constructed are of little independent
interest. Estimates of students' choice sets, institutions' admissions
patterns, and students' residency/commuter choices are of value only as
input to the estimation of the conditional logit parameters. The
accuracy of the imputed data is also open to question. The KMM procedure
for imputing choice sets is based on the implausible assumption that
every institution within 200 miles of a student's high school is equally
likely to have been considered. And their approach to estimating a
student's admissibility to an institution clearly leads to imputation
errors: an institution is included in the student's choice set when he
would not have been admitted there, and conversely.
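
The KMM imputation procedure, as this report describes it, can be sketched as follows. This is an illustration only: the candidate list is assumed to be already screened for the 200-mile radius, single-sex, and residency rules, and the handling of the attended institution during the random draws is an assumption, since the report does not say how KMM treated it. All names are hypothetical.

```python
import random

def impute_choice_set(candidates, attended, admit_prob, rng,
                      max_size=10, min_prob=0.25):
    """Draw up to `max_size` 'available' institutions, then add the one attended.

    candidates : institutions within 200 miles that passed the other screens
    admit_prob : function giving the estimated admission probability
    """
    pool = [c for c in candidates if c != attended]  # assumed exclusion
    rng.shuffle(pool)  # random order amounts to random selection without replacement
    available = []
    for inst in pool:
        p = admit_prob(inst)
        if p < min_prob:
            continue  # rejected outright when admission probability is below .25
        if rng.random() < p:  # include with probability equal to p
            available.append(inst)
        if len(available) == max_size:
            break
    return available + [attended]

# Hypothetical example: 30 nearby colleges, each with admission probability 0.6.
rng = random.Random(0)
colleges = [f"college_{n}" for n in range(30)]
choice_set = impute_choice_set(colleges, "college_3", lambda inst: 0.6, rng)
```

Every candidate in the pool has the same chance of being drawn, which is the equal-likelihood assumption criticized above, and the random inclusion step is what admits institutions that would in fact have rejected the student.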

RM avoid the problem of identifying the specific institutions a
student considered by assuming that the student chooses among
"representative" institutions whose attributes are the mean values of
the attributes of institutions in various categories. They further
stratify institutions by the attributes which enter the model (cost and
quality), ensuring that the within-category variance of the variables is
small and that each category's "representative" institution is similar
to other institutions within its category. Since the mean attributes
within a category are somewhat insensitive to the inclusion or
exclusion of any particular institution, the accuracy of their
admissions models is less critical. But this procedure is impractical if
the variables in the model depend on more than two or three
institutional attributes. As the number of institutional attributes
included in the model is increased, one must expand the stratification
scheme (vastly increasing computation costs) or enhance the risk of
imputation errors (differences between the attributes of the
institutions a student considered and the mean attributes of the
institutions in the various categories).
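
A minimal sketch of the "representative institution" construction follows. It assumes unweighted means, as the report itself presumes RM used; the category labels and figures are hypothetical.

```python
from collections import defaultdict

def representative_institutions(available):
    """Mean cost and quality per category for one student's available set.

    available : iterable of (category, cost, quality) tuples
    returns   : {category: (mean_cost, mean_quality)}
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for category, cost, quality in available:
        acc = sums[category]
        acc[0] += cost
        acc[1] += quality
        acc[2] += 1
    return {cat: (acc[0] / acc[2], acc[1] / acc[2]) for cat, acc in sums.items()}

# Hypothetical institutions available to one student.
available = [
    ("low cost / low quality", 400.0, 450.0),
    ("low cost / low quality", 500.0, 470.0),
    ("high cost / high quality", 3000.0, 600.0),
]
reps = representative_institutions(available)
```

With only two stratifying attributes the category means stay close to their members; adding a third or fourth attribute multiplies the number of cells, which is the practical limit noted above.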

Computational Problems

The maximum likelihood procedures used to estimate the parameters
of a conditional logit model are very expensive, limiting the extent to
which alternative functional forms or specifications of variables can
be explored within the research budget. One of KMM's college choice
runs, for example, required 840 CPU seconds on an IBM 370/168 to
estimate the parameters of a 10-variable specification for about 3,100
students having about 30,200 choices.¹⁵ Another run to estimate a
20-variable specification of their go/no-go model for about 7,100
students required 1,040 CPU seconds.

This limitation is particularly apparent in RM's work. Beyond the
variables which entered their model (institutional cost and quality,
and student income and ability), they wished to explore the influence
of some 21 additional student variables¹⁶ on students' college-going
rates and patterns. The natural approaches to the problem (estimating
alternative specifications of the model which incorporated the
additional variables and testing their significance, or stratifying the
students by levels of the variables and fitting the model for each
stratum) were precluded by the prohibitive costs (and small cell sizes).
Instead, RM used their basic model to predict the distributions of
students, stratified by the variables to be explored, among
postsecondary outcomes. These distributions were then compared to the
students' actual distributions to discover whether "improved"
predictions were obtained by taking account of differences among
students in terms of the variables. The computational limitations of the
maximum likelihood approach thus imposed an extremely cumbersome
approach to the exploration of alternative specifications of the model.



¹⁵ Each student's choice set included his chosen college and 10 (or
fewer) imputed alternatives.

¹⁶ Student's sex and various measures of student's attitudes,
aspirations, and expectations. See [2, p. 51] for a list of variables.
Problems of Omitted Variables

The formulation of the conditional logit model in terms of
individuals' preferences limits the analysis to variables that have a
behavioral interpretation. Institutional size, for example, does not
readily fit in unless one contends that the differences in sizes of
institutions reflect differences in the perceived utilities of size to
potential students. Neither RM nor KMM were willing to do that; both
implicitly assume that institutions are large or small only because
their other attributes are relatively attractive to many or few
students. But size is important; it reflects a number of institutional
attributes, some of which cannot easily be measured: academic
reputation, capacity constraints, recruiting efforts, quality of
football teams, climate, recreational facilities, proximity to
population centers, etc. Thus, there is reason to believe that the KMM
and RM lists of behavioral variables are incomplete, and that the
fitting process has compensated by putting larger (smaller) coefficients
on those variables positively (negatively) correlated with size.



III. A BAYESIAN ALTERNATIVE TO CONDITIONAL LOGIT



Bayes' Theorem provides an alternative approach to the problems
of modeling individuals' choices which, we contend, alleviates many of
the problems discussed above. This section develops a general theory
for estimating the probability P(j|i) that individual i chooses
institution j. We then summarize our empirical approach to estimating
the distribution of student characteristics, deferring detailed
discussion to Sec. IV. We show how student choice probabilities can be
derived from these empirical results, and conclude with a discussion of
the advantages of the approach.

THE FORMAL STRUCTURE

As above, let $X_i$ denote the $i$th individual's vector of
characteristics, $Z_{ij}$ the $j$th institution's vector of attributes
with respect to the $i$th individual. Our goal is to obtain a convenient
parameterization for $P(j \mid i)$ in terms of $X_i$ and $Z_{ij}$.

We model $X_i$ as a transformed multivariate normal vector with mean
$\mu_{ij} = \mu(Z_{ij})$ and covariance matrix
$\Sigma_{ij} = \Sigma(Z_{ij})$. Thus, our basic assumption is that

$$ \{T(X_i) \mid Z_{ij}\} \sim N(\mu_{ij}, \Sigma_{ij}) , $$

where $T$ is a real-valued vector function. Letting

$$ Y_i = T(X_i) , $$

we note then that $P(i \mid j)$ can be replaced in Bayes' formula by the
function

$$ f(Y_i \mid Z_{ij}) \propto |\Sigma_{ij}|^{-1/2} \exp\bigl\{ -\tfrac{1}{2} (Y_i - \mu_{ij})' \, \Sigma_{ij}^{-1} (Y_i - \mu_{ij}) \bigr\} . \tag{5} $$
A slightly more general class of models is obtained by assuming
that $Y_i$ may be broken into subcomponents $Y_{1i}$ and $Y_{2i}$.
$Y_{1i}$ is assumed to be multivariate normal, given $Y_{2i}$ and
$Z_{ij}$, with a mean vector and covariance matrix that depend in an
unspecified manner on $Y_{2i}$ and $Z_{ij}$. $Y_{2i}$ is assumed to be
multivariate normal, its parameters dependent only on $Z_{ij}$. The
function $f$ might then be factorized as follows:

$$ f(Y_i \mid Z_{ij}) = f_1(Y_{1i} \mid Y_{2i}, Z_{ij}) \, f_2(Y_{2i} \mid Z_{ij}) , \tag{6} $$

where $f_1$ and $f_2$ are multivariate normal densities, as in Eq. (5).
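
The factorization in Eq. (6) can be evaluated on the log scale, which avoids underflow when densities are small. The sketch below is an illustration under simplifying assumptions that are not from the report: the first component is taken to be bivariate and the second univariate, and the caller supplies the means and covariances. The normalizing constants are included, although Eq. (5) defines f only up to proportionality.

```python
import math

def log_normal_1d(y, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def log_normal_2d(y, mean, cov):
    """Log density of a bivariate normal; cov is [[s11, s12], [s12, s22]]."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d0, d1 = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form d' cov^{-1} d via the closed-form inverse of a 2x2 matrix.
    quad = (s22 * d0 * d0 - 2 * s12 * d0 * d1 + s11 * d1 * d1) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def log_f(y1, y2, mean1, cov1, mean2, var2):
    """log f(Y_i | Z_ij) = log f1(Y_1i | Y_2i, Z_ij) + log f2(Y_2i | Z_ij)."""
    return log_normal_2d(y1, mean1, cov1) + log_normal_1d(y2, mean2, var2)

# Hypothetical values for one student and one institution.
value = log_f([1.0, 2.0], 0.5, [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]], 0.0, 1.0)
```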

DISTRIBUTION OF STUDENT CHARACTERISTICS

In our empirical work, we investigated probability distributions
whose densities $f$ could be factored as in Eq. (6). We tried
transformations $T$ that were simple, conditioned on location of high
school ($Y_2$) in modeling other student characteristics ($Y_1$), and
assumed that means $\mu_{ij}$ were linear in institutional attributes
and that covariance matrices $\Sigma_{ij}$ were constant within groups
of institutions. Thus, we were able to estimate parameters of the
distributions of characteristics using ordinary linear regression.

We confined our attention to students who went on to college.
Although the theory could just as easily have handled the nongoers as
an additional category, we felt that it would lighten our load
considerably to omit them and that it would still be possible to make
direct comparisons with other studies.

Since our objective was to obtain the probability distribution of
characteristics, it seemed practical (and prudent) to choose only a few
important ones. KMM and RM stressed the importance of such student
characteristics as ability, family income, and location of high school.
Similarly, they focused on a small subset of institutional attributes:
type of institution (public or private, two- or four-year), cost, and
location. These variables were available in the SCOPE and Institutional
Domain File data bases.

In estimating the parameters of the distribution of student
characteristics, we concluded with a simple model in which students were
stratified by state of residence and sex: eight categories in all.
Within strata, student ability (measured by the sum of verbal and
mathematical AAT test scores) was regressed on institutional quality
(measured by the mean SAT test scores of students attending the
institution); the logarithm of family income was regressed on the
estimated cost of attending the institution; and the logarithm of the
distance between the student's home and institution was regressed on a
constant. We examined the residuals from these regressions to verify
that they were approximately normally distributed.

In constructing f, we let $f_1$ be the conditional distribution of
ability and log income, given location of the student's home; $f_2$, the
distribution of log distance. The $\mu_{ij}$'s were obtained from the
regression equations. The $\Sigma_{ij}$'s were taken as the sample
covariance matrices of residuals within state of residence, sex, and
certain categories of institutions.^17
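The construction of f described above can be sketched in code. The
fragment below is an illustration only, not the program used in the
study: it assumes that $f_1$ is a bivariate normal density for (ability,
log income), that $f_2$ is a univariate normal density for log distance,
and that the means and covariance terms have already been obtained from
the regressions. The function and argument names are hypothetical.

```python
import math

def log_bivariate_normal(x, mean, cov):
    """Log density of a bivariate normal at x (mean: 2-vector, cov: 2x2)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx' cov^{-1} dx, using the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def log_normal(x, mean, sd):
    """Log density of a univariate normal at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_f(ability, log_income, log_distance, mu1, cov1, mu2, sd2):
    """log f = log f1(ability, log income | location) + log f2(log distance).

    mu1, cov1: assumed regression mean and residual covariance for f1.
    mu2, sd2:  assumed regression mean and residual std. deviation for f2.
    """
    return (log_bivariate_normal((ability, log_income), mu1, cov1)
            + log_normal(log_distance, mu2, sd2))
```

Working on the log scale is a design choice of this sketch; it keeps the
product of two small densities from underflowing when many institutions
are evaluated for one student.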

STUDENT CHOICE PROBABILITIES

The problem is now to predict the institution an individual will
choose, based on his vector of characteristics $Y_i$. Assume for the
moment that there are K institutions on his list, which might include
all institutions in the nation, or simply all institutions within a
given distance radius. If we let P(j|i) be the probability that
student i chooses institution j, Bayes' Theorem yields

    P(j \mid i) = \frac{P(j) \, f(Y_i \mid Z_{ij})}{\sum_{k=1}^{K} P(k) \, f(Y_i \mid Z_{ik})},    (7)

where f is as above, and P(k) is the prior probability of choosing
institution k.
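The Bayes' Theorem step can be illustrated with a short sketch. It
assumes that a log density has already been computed for each of the K
institutions on a student's list and that the priors are positive
weights (they need not sum to one, since any common scale factor cancels
in the ratio). The function name and the example figures are
hypothetical.

```python
import math

def choice_probabilities(priors, log_densities):
    """P(j|i) proportional to P(j) * f(Y_i | Z_ij), normalized over the list."""
    # Combine prior and density on the log scale, then subtract the
    # maximum before exponentiating to avoid numerical underflow.
    scores = [math.log(p) + lf for p, lf in zip(priors, log_densities)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical list of three institutions: prior weights and log densities.
prior_weights = [3000, 1000, 500]
log_f_values = [-4.0, -3.0, -3.5]
probs = choice_probabilities(prior_weights, log_f_values)
```

When all densities are equal, the result reduces to the normalized
priors, which is a convenient check on an implementation.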

^17 We observed that the dispersion of residuals for California
two-year public and Massachusetts high-cost private institutions
differed from the state-wide pattern; in these cases, we used their
specific sample covariance matrices.

We took the prior probability P(k) to be proportional to the size
of the freshman class by sex.^18 We felt that this was the best
analysis-independent indicator of institutions' relative abilities to
attract and absorb a student. It controls for an institution's capacity
constraints. At the same time, it reflects the several factors (academic
reputation, recruiting efforts, etc.) that affect student choices for
which data are not available.

FEATURES OF THE APPROACH

Thus, a formula for the probability of an individual choosing a
specific institution has been derived. According to Eqs. (5) and (6),
the class of models is quite rich. And, unlike earlier models, this
formula utilizes information not directly related to preference:
institutional size, for example.

The approach succeeds in placing the task of modeling back into
the familiar framework of ordinary linear regression, translating the
problem of predicting choice into the problem of predicting
characteristics. Thus, it is possible to utilize many of the important
and familiar features of the linear model, including the ability to look
at several different regressions based on one accumulation, the ability
to test hypotheses about the effects of groups of variables, and the
ability to examine lack of fit via residual plots. Computational costs
are also orders of magnitude lower.

But the most important feature of the model is that it avoids the
fundamental problem of imputing each student's choice set. Here, the
alternative institutions only enter in defining independent variables.
Thus, if the institution was not considered, and hence its particular
attributes were unimportant, the corresponding independent variables
will be expected to have coefficients close to zero. As an example,
it might be reasonable to suppose that a high ability student with a
public university nearby would be less likely to enter a two-year college
than a similar student with no public university nearby. If true, the
ability of a student at a junior college will depend on the presence or
absence of a public university near his home; that hypothesis could be
investigated by including in the ability prediction equation the
appropriate indicator variable.

^18 The Institutional Domain File provided these data for two-year
colleges for the prior year, 1965. For other institutions, however,
the data pertained to 1967, the year after the SCOPE students
matriculated. Since the SCOPE students comprised only a small fraction
of total enrollments in 1966, we assumed that 1967 enrollments were
independent of SCOPE students' choices and, thus, that they could be
used in Bayes' formula.
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IV. MODEL ESTIMATION 



In this section, we provide details of our empirical analysis of
the distribution of student characteristics from the SCOPE and
Institutional Domain File data bases.

CONSTRUCTING THE DATA BASE

Based on the studies cited earlier, we assumed that student ability,
family income, and location of residence were the important student
characteristics; that institutional type, cost, and location were the
important institutional attributes. In all, we were able to obtain
complete records for some 14,851 of the original cases. Below, we
describe briefly how the variables were constructed.

Student Ability

SCOPE used the standardized achievement test (AAT) to obtain
measures of student verbal and math achievement. Most students in the
SCOPE sample took the test; we excluded those who did not take both
parts. Initially, we treated the verbal and math scores separately, but
we found no useful information in their joint distribution. In the end,
we used the sum of the two test scores as a single measure of ability.

Student's Family Income 

We used the RM procedure for imputing family income, truncating
their estimates to the interval $5,000 to $25,000.^19 This specification
was broad enough so that an income figure could be imputed for all
records.

^19 RM fit a linear regression model which we believe gave poor
estimates at the extremes.

Student Residency Location

We obtained high school latitude and longitude for all but one high
school. We reasoned that this would be a satisfactory approximation of
students' places of residence. The exception, of course, would be
students whose families had moved; but we had no information about
movers, and we felt that their number would not be large enough to
have a major impact on our results.

Institutional Quality

The Institutional Domain File contains the average Scholastic
Aptitude Test score (math plus verbal) for students at each
institution. Following KMM and RM, we use this as the measure of
institutional quality.

Institutional Cost

We estimate institutional cost as follows:

    [TUITION] + [ROOM & BOARD] + [COMMUTATION COSTS].

Tuitions at public sector institutions were obtained from college
catalogues; tuitions for the private institutions were obtained from
the Institutional Domain File.

Room and Board was assumed to be zero if the institution was within
30 miles of the student's home; equating 30 miles with 50 minutes
driving time would make this consistent with the RM study. For
institutions farther away, we used the room and board fee provided by
the Institutional Domain File if available; otherwise, we used the
national average room and board fee of $972 for public institutions,
$1,140 for private institutions.

Commutation Costs, again taken from RM, were assumed to be $180
for institutions within 30 miles of a student's home; zero otherwise.
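The cost rule above can be written as a small function. This is a
sketch under stated assumptions rather than the study's own code: the
function and parameter names are hypothetical, and "within 30 miles" is
taken to include exactly 30 miles, which the text does not specify.

```python
def institutional_cost(tuition, distance_miles, room_and_board=None, is_public=True):
    """Estimated annual cost: tuition + room and board + commutation costs."""
    if distance_miles <= 30:  # assumed boundary: exactly 30 miles counts as commuting
        # Commuting student: no room and board, $180 commutation cost.
        return tuition + 0 + 180
    if room_and_board is None:
        # No fee on file: fall back on the national averages quoted in the text.
        room_and_board = 972 if is_public else 1140
    # Resident student: room and board, no commutation cost.
    return tuition + room_and_board + 0
```

For example, a public institution with a tuition of $500 would be costed
at $680 for a student living 10 miles away and at $1,472 for a student
living 100 miles away when no room and board fee is on file.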


ESTIMATING THE DISTRIBUTION OF STUDENT CHARACTERISTICS

Spurred on by what we thought was a rather large data base, we
initially posed models of student characteristics that were rich in
parameters, conditioning on a large number of aspects of the student's
institution and home environments. The richer models tended to yield
inconsistencies, usually in the form of counter-intuitive signs on
regression coefficients in certain strata of the SCOPE population.
Our response was generally to look at simpler models that yielded
plausible results, and the final equations, obtained after
systematically eliminating the spurious fits, are fairly parsimonious.
Only these final results are reported.

We began by stratifying the SCOPE population into the eight groups:
state of residence by sex. Within each group, we conditioned on the
location of high school and choice of institution, and attempted to model
the student's joint ability and income distribution; then, conditioning
only on the choice of institution, we attempted to model the location of
students' high schools.

We divided the institutions available to a given student into five
types: (1) public two-year colleges, (2) public four-year colleges,
(3) public universities, (4) low-cost (tuition ≤ $1,000) private
institutions, and (5) high-cost (tuition > $1,000) private institutions.
We reasoned that the regression coefficients on institutional attributes
would be likely to depend on some categorization such as this, and in
forming our models, we interacted them separately with the various
independent variables.
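The five-way grouping can be expressed as a simple lookup. The sketch
below is illustrative; the function name and the string labels for
control and level are assumptions, while the $1,000 tuition cut is the
one stated above.

```python
def institution_type(control, level, tuition):
    """Return the type code 1-5 used to group institutions.

    control: "public" or "private" (assumed labels).
    level:   "two-year", "four-year", or "university"; only consulted
             for public institutions.
    """
    if control == "public":
        return {"two-year": 1, "four-year": 2, "university": 3}[level]
    # Private institutions are split on tuition at $1,000.
    return 4 if tuition <= 1000 else 5
```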

Ability

Table 2 shows the results of the ability regressions. The equations
have institutional type main effects and quality by institutional type
interactions. We note that there is significant variation in the
coefficients within each equation: tests for the importance of the main
effects and for the institutional quality interactions showed these
terms to be significant. And, where coefficients of institutional
quality are significantly different from zero, they generally have the
right (positive) sign, consistent with higher quality schools attracting
higher ability students.
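A specification with a main effect for each institutional type and a
separate quality slope for each type yields the same coefficient
estimates as fitting one straight line per type. The sketch below uses
that equivalence to stay within the Python standard library. It is not
the procedure used to produce Table 2; the record layout and function
name are hypothetical, and each type is assumed to contain at least two
distinct quality values.

```python
from collections import defaultdict

def fit_lines_by_type(records):
    """Least-squares line of ability on quality, fitted separately by type.

    records: iterable of (inst_type, quality, ability) tuples.
    Returns {inst_type: (intercept, slope)}.
    """
    groups = defaultdict(list)
    for inst_type, quality, ability in records:
        groups[inst_type].append((quality, ability))
    fits = {}
    for inst_type, pairs in groups.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        slope = sxy / sxx
        fits[inst_type] = (mean_y - slope * mean_x, slope)
    return fits
```

In a pooled fit the residual variance would be shared across types; the
sketch returns only the coefficients and leaves the variance estimate
aside.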

Of course, in the present circumstances, it is very important to
investigate whether the distribution of the residuals is normal; this
would be a necessary condition for the distribution of student
characteristics to be multivariate normal. Thus, we obtained a random
sample of 200 observations and plotted residuals separately against
predicted values, institutional type, and quality to look for departures
from the homoscedastic patterns; we also looked at normal probability
plots of the residuals. We concluded in all cases that the residuals
looked fairly normal, but in two instances (California public two-year
colleges, Massachusetts private high-cost institutions) the spread of
the residuals for both males and females was larger than for the rest
of the state. In these cases, we chose to fit separate variance terms
to the ability residuals.

Income

We observed by looking at probability plots of various income
regressions that the normal assumptions would be seriously violated
unless income were transformed. The logarithmic transformation seemed to
work reasonably well; we ended up using it exclusively throughout.

Table 3 provides the results of regressing log (income) on
institutional type and institutional type interacted with cost. The
coefficients of cost generally had the correct sign: where significant,
they suggested that higher income students attended the more expensive
schools. We found, however, that knowing institutional cost did not
reduce the variance of log income by a large amount.

The normal probability plots of the log (income) residuals showed 
this variable to be approximately normal. It also appeared that the 
spread of the residuals was independent of the various independent 
variables. 

Joint Distribution of Ability and Income 

A final step in characterizing the distribution of these quantities
was to investigate their joint distribution. The basic requirement for
the use of Eqs. (5) and (6) (Sec. III) is that the residuals of the
previous regressions should appear to have a bivariate normal
distribution. We looked at scatterplots of ability and log (income)
residuals within state and sex to see if, in fact, they formed an
elliptical pattern. We observed no obvious violations in these
scatterplots and concluded that the multivariate normal assumption for
ability and log (income) was reasonably consistent with the data.
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Location of High School 

We assumed that the distribution of the location of high school was
a function of the distance between the high school and institution.
As with income, distance was transformed to logarithms; then, a simple
model was fit including only dummy variables for institutional type.
The equations are shown in Table 4. Our search for heteroscedastic or
nonnormal patterns in the residuals proved negative, and we concluded
that the normality assumptions were approximately true.

Table 4

RESULTS OF STUDENT LOG (DISTANCE) REGRESSIONS

                                             Regression Coefficients
                                             ------------------------------------------------------
                                  Estimated             Type of Institution Dummy
                  Sample          Standard   Constant   -------------------------------------------
Subsample         Size     R^2    Deviation  Term       Public    Public    Public      Private
                                                        2-yr      4-yr      University  (Tuition
                                                                                        ≤ $1000)

California
  Males           2133     0.42   0.6139      1.910     -1.?46    -0.497    -0.314       0.402
                                             (38.6)[a]  (25.9)    (8.0)     (5.0)       (3.6)
  Females         1898     0.41   0.6763      1.955     -1.359    -0.517    -0.388       0.291
                                             (39.5)     (25.4)    (7.7)     (6.2)       (3.1)

Illinois
  Males           2209     0.38   0.6516      1.840     -1.173    -0.045     0.40       -0.024
                                             (62.6)     (29.1)    (9.9)     (1.0)       (0.5)
  Females         1810     0.40   0.6455      1.919     -1.366    -0.262    -0.196      -0.061
                                             (63.5)     (31.3)    (5.6)     (4.6)       (1.0)

Massachusetts
  Males           1700     0.16   0.7351      1.504     -0.682    -0.385     0.360      -0.121
                                             (56.2)     (13.5)    (7.9)     (6.5)       (1.5)
  Females         1255     0.24   0.6858      1.703     -0.981    -0.609     0.053      -0.191
                                             (57.3)     (17.0)    (12.2)    (0.8)       (2.3)

North Carolina
  Males           1989     0.33   0.6603      2.095     -1.461    -0.508    -0.025      -0.586
                                             (32.8)     (19.8)    (7.3)     (0.4)       (?)
  Females         1857     0.20   0.6710      1.993     -1.251    -0.312     0.125      -0.473
                                             (33.7)     (16.6)    (4.9)     (1.3)       (7.3)

[a] t statistics are shown in parentheses.
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V. PREDICTIVE POWER 



This section reviews RM's tests of the predictive accuracy of
their model, reports the results of a similar test of ours, and
compares the two sets of results.

RM'S SIMULATIONS

RM drew two samples of students: Sample I consists of 369 students
whose parents had not responded to the family income question;
Sample II consists of 375 students whose parents had reported family
income. Each sample contains approximately equal numbers of students
from each state. They further divided each sample by student test
scores into four ability groups. Then they used estimated family income
in all Sample I analyses, but performed Sample II analyses separately
using parent reported income (IIA) and estimated income (IIB).

RM estimate the parameters of their model separately for each of
the 12 cases (four ability groups by Samples I, IIA, and IIB). They
calculate the probability that each student will choose each option in
his choice set.^21 The probabilities are summed by option to obtain the
predicted distribution of students among options.

To facilitate comparisons with our results, we eliminated predicted
and actual nongoers, and rescaled the predicted and actual distributions
of college goers to sum to one. RM's rescaled results are displayed in
Tables 5 and 6.

BAYESIAN SIMULATIONS 

Our model can be used to predict the distribution of students over
all colleges in the country. However, the predicted probability that a
student will attend any particular college rapidly declines with distance.
We felt that a simulation of students' choices among the institutions
"near" their homes would lead to reasonably accurate predictions and
would be much less expensive; arbitrarily, we chose a 50-mile boundary.

^20 KMM did not provide a test of the predictive accuracy of their
model.

^21 Recall, in RM's formulation, that a student's choice set consists
of not going on to college or attending one of nine "representative"
institutions, each of which offers the mean attributes of the
institutions in a cost by quality category.

We used Eqs. (5) through (7) to calculate the probability that each
student would attend each institution located within 50 miles of his
high school; the 155 students with no institution within 50 miles were
deleted. We then stratified the institutions into RM's nine quality/cost
categories, and summed the estimated probabilities over institutions in
each category. Finally, we counted the actual number of students in each
category, regardless of whether they attended an institution within
50 miles.
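The summation step can be sketched as follows. The quality and cost cut
points are those shown in the headings of Table 7; the treatment of
values falling exactly on a boundary, the use of a single cost figure
per institution, and all names in the code are assumptions made for
illustration.

```python
from collections import defaultdict

def quality_cost_category(mean_sat, cost):
    """Assign one of the nine quality-by-cost cells (boundary handling assumed)."""
    quality = "low" if mean_sat < 480 else "medium" if mean_sat < 540 else "high"
    cost_level = "low" if cost < 600 else "medium" if cost < 2250 else "high"
    return quality, cost_level

def predicted_counts(student_probs, institutions):
    """Sum each student's choice probabilities into quality/cost cells.

    student_probs: list of {institution_id: P(j|i)}, one dict per student,
                   covering the institutions within 50 miles.
    institutions:  {institution_id: (mean_sat, cost)}.
    """
    totals = defaultdict(float)
    for probs in student_probs:
        for inst_id, p in probs.items():
            totals[quality_cost_category(*institutions[inst_id])] += p
    return dict(totals)
```

Because each student's probabilities sum to one, the predicted counts
over all cells sum to the number of students simulated.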

Table 7 shows the predicted and actual number of students in each
state who attended institutions in each of the nine categories. We
then stratified the students by the RM ability criteria and summed over
states to obtain the predicted and actual number of students in each
ability group by institutional category. Table 8 presents these data
in the format of Tables 5 and 6, facilitating a comparison of our
results with those of RM.

COMPARISON OF RESULTS

We used the Gini coefficient [10] to measure the accuracy of the
predicted frequency distributions. It is the sum of the absolute
differences between the predicted and actual frequencies; higher values
thus imply greater discrepancies between these distributions. Table 9
provides Gini coefficients for each of the simulations discussed above.
It is clear that our predicted distributions are substantially closer
than RM's to the actual distributions in every case.
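As a worked illustration of this measure, the sketch below applies it to
the four-state totals for the nine quality-by-cost cells of Table 7. It
assumes that the frequencies are expressed as percentage distributions,
as the title of Table 9 suggests; the function name is hypothetical.

```python
def gini_coefficient(predicted, actual):
    """Sum of absolute differences between two percentage distributions."""
    total_p, total_a = sum(predicted), sum(actual)
    return sum(abs(100 * p / total_p - 100 * a / total_a)
               for p, a in zip(predicted, actual))

# Table 7 "Total" rows, cells ordered by quality (low, medium, high)
# and, within quality, by cost (low, medium, high).
predicted = [4898, 3429, 187, 796, 2283, 135, 269, 2057, 642]
actual = [4833, 2943, 165, 655, 2589, 258, 272, 2090, 891]
overall = round(gini_coefficient(predicted, actual), 1)  # 9.7
```

Under these assumptions the result agrees with the figure of 9.7 shown
for all ability groups in the Rand Study column of Table 9.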

We recognized, however, that according to the law of large numbers,
this comparison favored the Bayes approach: it utilized more than 14,000
observations whereas RM used fewer than 400. So, we randomly assigned
each of the 14,696 students in our sample to one of 40 subsamples, and
replicated the simulation in each case.^22 We computed the Gini
coefficients for the predicted and corresponding actual distributions of

^22 Subsample sizes ranged from 335 to 405, averaging 367.
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Table 7 



PREDICTED AND ACTUAL NUMBER OF STUDENTS ATTENDING
INSTITUTIONS, BY COST/QUALITY CATEGORIES AND BY STATE

                                      Cost Category
              -----------------------------------------------------------------------------------
Institution   Low                  Medium               High
Quality       (Less than $600      ($600-$2250          ($2250+
Category[a]   per year)            per year)            per year)            All
  State       Predicted  Actual    Predicted  Actual    Predicted  Actual    Predicted  Actual

Low (Less than 480)
  Calif.      2174       2425      320        154       23         [?]       [?]        [?]
  Ill.        1581       1309      582        732       17         [?]       [?]        [?]
  Mass.       238        457       562        321       90         78        889        856
  N.C.        905        642       1965       1736      57         29        2927       2407
  Total       4898       4833      3429       2943      187        165       8514       7941

Medium (480-540)
  Calif.      535        318       380        477       [?]        [?]       [?]        [?]
  Ill.        170        93        957        946       [?]        [?]       [?]        [?]
  Mass.       37         52        549        730       55         112       642        894
  N.C.        54         192       397        436       16         20        467        648
  Total       796        655       2283       2589      135        258       3214       3502

High (540+)
  Calif.      243        247       241        265       76         90        561        602
  Ill.        13         12        524        520       136        265       673        797
  Mass.       13         13        998        766       413        426       1424       1205
  N.C.        0          0         294        539       17         110       311        649
  Total       269        272       2057       2090      642        891       2968       3253

All
  Calif.      2952       2990      941        896       125        132       4018       4018
  Ill.        1764       1414      2063       2198      191        407       4019       4019
  Mass.       288        522       2109       1817      558        616       2955       2955
  N.C.        959        834       2656       2711      90         159       3705       3704
  Total       5963       5760      7769       7622      964        1314      14696      14696

[a] Measure of quality = average SAT score of all students attending the
institution.

Table 9

COMPARISON OF STUDY RESULTS: ABSOLUTE VALUES OF DEVIATIONS
BETWEEN PREDICTED AND ACTUAL PERCENTAGE DISTRIBUTIONS, BY
STUDENTS' ABILITY GROUP

                   Radner and Miller Study
                   ---------------------------------------
                              Sample II
                              ----------------------------
Students'                     Parent Reported   Estimated
Ability Group[a]   Sample I   Income            Income       Rand Study

Low                29.0       24.1              24.0         11.0
Medium low         40.3       50.2              118.7        9.6
Medium high        46.6       38.9              37.4         12.5
High               73.4       48.7              50.2         18.3
All                36.1       26.2              26.5         9.7

[a] Low = less than 400; medium low = 400-475; medium high = 475-550;
high = 550+.



students at each ability level and across ability levels. Table 10
shows the maximum, mean, minimum, and standard deviation of these Gini
coefficients by student ability level. For reference purposes, it also
shows the smallest Gini coefficients for the three comparable RM
predictions.

Our least accurate prediction, over 40 samples, is superior^23 to
RM's most accurate prediction, over three samples, for students in the
medium-low, medium-high, and high ability groups and across ability
groups. In the case of low ability students, 34 of our 40 predicted
distributions were more accurate than RM's most accurate prediction.
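The replication design can be sketched with the standard library. The
code assumes that each student was assigned independently and with equal
probability to one of the 40 subsamples, which is consistent with the
unequal subsample sizes reported but is not stated explicitly; the seed,
the function names, and the use of the sample (n - 1) standard deviation
are likewise assumptions.

```python
import random
import statistics

def assign_subsamples(n_students, n_subsamples=40, seed=0):
    """Independently assign each student to one of n_subsamples at random."""
    rng = random.Random(seed)
    return [rng.randrange(n_subsamples) for _ in range(n_students)]

def summarize(coefficients):
    """Maximum, mean, minimum, and sample standard deviation of a list."""
    return (max(coefficients), statistics.mean(coefficients),
            min(coefficients), statistics.stdev(coefficients))

labels = assign_subsamples(14696)
sizes = [labels.count(k) for k in range(40)]
```

With 14,696 students and 40 subsamples the average subsample size is
about 367, as reported; the individual sizes vary from one random
assignment to another.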



^23 That is, it had a lower Gini coefficient.

Table 10

DISTRIBUTION OF ABSOLUTE VALUES OF DEVIATIONS BETWEEN
PREDICTED AND ACTUAL PERCENTAGE DISTRIBUTIONS FOR 40
INDEPENDENT SUBSAMPLES, BY STUDENTS' ABILITY GROUP

                   Summary Statistics for 40 Subsamples
                   -------------------------------------
Students'                                      Standard    Lowest Coefficient
Ability Group[a]   Maximum   Mean    Minimum   Deviation   for 3 RM Samples

Low                32.3      17.2    7.8       6.5         24.0
Medium low         35.6      20.7    9.7       6.4         40.3
Medium high        37.2      23.5    10.8      [?]         37.4
High               47.8      28.9    10.7      7.6         48.7
All                19.4      13.6    7.6       3.4         26.2

[a] Low = less than 400; medium low = 400-475; medium high = 475-550;
high = 550+.

VI. CONCLUDING REMARKS

We find that our predictions are considerably closer to the actual
values than those based on the conditional logit approach. In addition,
the Bayesian methodology is easier to use, offers much greater
flexibility, and is much less expensive to apply. Thus, we feel that it
offers considerable advantage over the conditional logit approach in
the present context.

A number of recent studies have employed the conditional logit
approach to model choice behavior in various areas, including education,
transportation [6], and occupation [7,8]. While the Bayesian formulation
might not be superior in all instances, our results suggest that
it is a viable alternative. Even in those cases where the conditional
logit approach might be preferred on theoretical grounds, the Bayesian
methodology would be a useful adjunct in the exploratory stages of
research.
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